# INT8 quantization
## Mistral Small 3.1 24B Instruct 2503 Quantized.w8a8
**Maintainer:** RedHatAI · **License:** Apache-2.0 · **Downloads:** 833 · **Likes:** 2

An INT8-quantized version of Mistral-Small-3.1-24B-Instruct-2503, optimized by Red Hat and Neural Magic for fast-response, low-latency scenarios.
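The w8a8 scheme quantizes both weights and activations to 8-bit integers. As a minimal sketch of the underlying idea (symmetric per-tensor INT8 quantization in NumPy, not the actual Red Hat / Neural Magic compression pipeline):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map [-amax, amax] onto [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from INT8 codes and their scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # toy stand-in for a weight tensor
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Rounding error per element is bounded by half a quantization step (scale / 2).
print("max abs error:", np.max(np.abs(w - w_hat)))
```

Storing `q` instead of `w` cuts memory 4x versus FP32 (2x versus FP16), which is where the reduced GPU memory footprint of these models comes from.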
## Mistral Small 24B Instruct 2501 Quantized.w8a8
**Maintainer:** RedHatAI · **License:** Apache-2.0 · **Downloads:** 158 · **Likes:** 1
**Tags:** Large Language Model · Safetensors · Supports Multiple Languages

An INT8-quantized version of the 24B-parameter Mistral instruction-tuned model, significantly reducing GPU memory requirements and improving computational throughput.
## Deepseek R1 Distill Qwen 32B Quantized.w8a8
**Maintainer:** neuralmagic · **License:** MIT · **Downloads:** 2,324 · **Likes:** 9
**Tags:** Large Language Model · Transformers

An INT8-quantized version of DeepSeek-R1-Distill-Qwen-32B; weight and activation quantization reduce VRAM usage and improve computational efficiency.
## Qwen2.5 7B Instruct Quantized.w8a8
**Maintainer:** RedHatAI · **License:** Apache-2.0 · **Downloads:** 412 · **Likes:** 1
**Tags:** Large Language Model · Safetensors · English

An INT8-quantized version of Qwen2.5-7B-Instruct with reduced memory requirements and improved computational throughput, suitable for multilingual use in both commercial and research applications.
## BAAI Bge M3 Int8
**Maintainer:** libryo-ai · **License:** MIT · **Downloads:** 1,007 · **Likes:** 1
**Tags:** Text Embedding · Transformers

An ONNX INT8-quantized version of BAAI/bge-m3 for dense retrieval tasks, optimized for compatibility with Vespa embedding inference.
## Vit Base Patch16 224 Int8 Static Inc
**Maintainer:** Intel · **License:** Apache-2.0 · **Downloads:** 82 · **Likes:** 1
**Tags:** Image Classification · Transformers

An INT8 PyTorch model produced by post-training static quantization with Intel® Neural Compressor from a fine-tuned Google ViT model, significantly reducing model size while maintaining high accuracy.
## Ibert Roberta Large
**Maintainer:** kssteven · **Downloads:** 45 · **Likes:** 0
**Tags:** Large Language Model · Transformers

I-BERT is an integer-only quantized version of RoBERTa-large: parameters are stored in INT8 and inference runs on integer arithmetic, yielding up to 4x inference speedup.
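Integer-only inference of the kind I-BERT uses keeps the matrix multiply itself in integer arithmetic: INT8 operands are multiplied and accumulated in INT32, and the float scales are applied only once at the output. A hedged NumPy sketch of that idea (illustrative, not I-BERT's actual kernels or its exact quantization scheme):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    # Symmetric per-tensor INT8 quantization (an assumption for illustration).
    scale = np.max(np.abs(x)) / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

rng = np.random.default_rng(1)
a = rng.normal(size=(3, 5)).astype(np.float32)   # toy "activations"
w = rng.normal(size=(5, 2)).astype(np.float32)   # toy "weights"

qa, sa = quantize_int8(a)
qw, sw = quantize_int8(w)

# Integer matmul with INT32 accumulation; the two scales combine multiplicatively.
acc = qa.astype(np.int32) @ qw.astype(np.int32)
y_int_path = acc.astype(np.float32) * (sa * sw)

# Compare against the full-precision result.
y_fp32 = a @ w
print("max abs deviation from FP32:", np.max(np.abs(y_int_path - y_fp32)))
```

The speedup comes from the inner loop running entirely on cheap INT8xINT8 to INT32 operations; the float rescale touches each output element only once.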